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ABSTRACT. 



We define a class of multivariate exponential distributions as the 
distributions of occupancy times in upwards skip-free Markov processes in 
continuous time. These distributions are infinitely divisible, and the multivariate 
gamma class defined by convolutions and fractions is a substantial generalization of 
the class defined by Johnson & Kotz (1972). Parallel classes of multivariate 
geometric and multivariate negative binomial distributions are constructed from 
occupancy times in 'instant' upwards skip-free Markov chains. Maximum 
likelihood estimation and times series applications are discussed. 
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1. INTRODUCTION. 

The exponential distribution plays a central role in several fields of probability 
and statistics, and ranks in overall importance next to the normal distribution. While for the 
normal case we have a well established multivariate normal distribution, in the exponential 
case the situation is far from clear-cut. A variety of bivariate exponential distributions 
(BVE) have been defined in the past, some of them extendable to higher dimensions and to 
gamma distributions. 

The univariate exponential distribution has a number of important 
characterizations. Multivariate extensions of some of these characterizations were used for 
construction of multivariate exponential distil utions (MVE) by Marshall & Olkin (1967) 
(lack of memory property) and Paulson (1973) (a stochastic difference equation). Standard 

transformation techniques were exploited by Kibble (1941) using a % 2 -type derivation, and 
Moran (1969) using the distribution-function transformation and then the '-log' 
transformation to obtain a MVE from the multivariate normal via a multivariate uniform 
distribution. Gumbel (1960) explored possibilities of defining BVE and MVE classes 
based on the form of the joint distribution and density functions. Arnold (1975) constructed 
nested classes of BVE's by repeated application of geometric compounding. His 
constructions are equally applicable to multivariate geometric distributions (MVG). Wang Zi 
Kun (1980) derived the distributions of occupancy times (sojourn times) in birth-death 
processes; these distributions are MVE. This brief review of related research is not 
exhaustive; our research was motivated by these references. We extend the results of Wang 
Zi Kun to all upwards skip-free Markov processes on {0, 1, 2, ... }, and define a parallel 
MVG class. 

In Section 2 we give the definition of occupancy times and related notation, and 
construct a new class of MVE distributions. We will work mainly with moment generating 
functions (mgf), and we derive for them recursive formulae which involve the matrix of 
transition intensities of the underlying Markov process. Owing to infinite divisibility we 

have also a class of multivariate gamma distributions (MVT)- 

In Section 3 we construct a new class of MVG distributions from occupancy 
times in 'instant' Markov chains. The constructions in Sections 2 and 3 are completely 
analogous, and there is a one-to-one correspondence between our MVE and MVG 
classes, which is also a one-to-one correspondence between the underlying stochastic 
processes. This one-to-one correspondence is an extension of the well-known relationship 
between the mgfs of univariate exponentials and probability generating functions (pgf) 
of geometric distributions: 



O 
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9e( s ) = P p( s+1 ) for © = P' 1 -i, 
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where (p@ is the mgf of the exponential with parameter 0 , 0/(Q-s) , and P p is the 

pgf of the geometric distribution with parameter p, (l-p)/(l-pz). 

In Section 4 we discuss maximum likelihood estimation with our MVE class 
and indicate some time series applications. We propose estimation methods based on the 
mgf because it has a much more tractable form than the density function. Our comments in 
the Section are equally applicable to the MVG class, although there the scope of 
applications is probably limited. 

In Section 5 we discuss an alternative dr*" ration of an MVE class based on the 

generalization of the x 2 2 distribution, and state a conjecture that the occupancy times and 

this generalization define the same class of MVE. The support for this conjecture is the 
equivalence between a prir of subclasses of these distributions, proved by Kent (1982). 



2. OCCUPANCY TIMES. 



Let {ZjJ^o be an upwards skip-free Markov process on the state space of the 
non-negative integers {0, 1,2, ... } in continuous time t>0, given by the matrix of 



transition intensities 



v 0 


Xq 


0 


41 


- Vi 




^2,0 


^2 


- v 2 


^3,0 


^3,1 


4 3 



0 

^ 0 

-v 3 0 . . 



v y 

(2.1) 

wher' kj > 0 (i>0), fij > 0 (i>l), £,^>0 (i>j+2>2), and all the row-totals of Q 
are equal to zero. We denote by Q n the nxn upper left-hand comer submatrix of Q, 
corresponding to the states 0, 1, ..., n-1. Let S n = diag {sq, Sj, . . ., s^} be the 
diagonal matrix with real numbers Sj on the diagonal; they will be subsequently used as the 
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arguments of a joint mgf 

<p(s n ) = J ... J exp { s n ) f(x)dx , (2.2) 

where s n = (sq, Sj, . . s n _i). The index n will be omitted whenever its value is obvious 
from the context. 

The first hitting time from a state k to a state n > k is formally defined as 
x k n = min (t; Z t = n I Z 0 = k} , (2.3) 

and the vector of occupancy times during passage from k to n > k (denoted by T k n ) is 

the decomposition of the first hitting time t k n into the sum of the times that the Markov 
process Z has spent in each of the states 0, 1, n-1: 

T k .„ - (x kn (0),x M (l),...,t M (n-l)), 



where 



T k>n (h)=j I {Zt=h| Z 0 =k} dt 

(1(A) is the indicator function for the event A, and the integral is over the interval 

[0, T k n ] ). 

The joint mgf cp k n (s) for the vector of occupancy times during the passage 

from k to n>k can be derived using a backwards equations argument; the derivations 
below are similar to those for the first hitting times (fht) given by Rosenlund (1977). 
Firstly, owing to the strong Markov property of the process Z we have 




^k,n ^k.k+l + ^k+lJc+2 + • • • + Tn-l,n > 

with mutually independent summands, and trivially T k k = (0, 0, ..., 0). 
for the mgf s we have 



(2.4) 

Correspondingly 



; i 



4 



<Pk,n( s ) = <Pk,k+l< s ) <Pk + l,k+2(s) • • • ’ (2 ' 5) 

and (p kk (s)=l. The backwards equations for the 'one-step' mgf <p k + (s) = cp k 
can be expressed in the form 

(p k+ (s) = (v k -s k )- 1 [^ k + ^ k (p k .i >k+ i(s) + ^ k>k . 2 9 k -2 > k + i(s) + 

• • • + ^k,0 < P0,k+l ( ' S ^’ ( 2 - 6 ) 

where 9 q + ( s ) ” v o/( v cr s o)> an d q = 0- The equation (2.6) can be reexpressed as 



<Pk + (s) = V( v k- s k) t 1 - M v k- s k) <p k .i + (s) 



" ^k,k-2/( v k~ s k) ^-2^^ ~ • • • ' ^k,(A v k * s k) ( Po,k^ l” 1 * 

(2.7) 

which implies that the occupancy times vector T k + = T k k+1 is a convolution of a 
univariate exponential distribution and a geometric compound distribution. Hence 

T k + (tp k +) is infinitely divisible, and owing to (2.5) so are all the occupancy times. 

It is easy to show by induction, using (2.5) and (2.7), that (p k +(s) is a ratio of 
polynomials 

<P k + (s) = X k R k (s)/R k+1 (s), (2.8) 




1 u 



where 



R 0 (s) = 1, 

Ri(s) = v 0 - s 0 , 
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R-2( s ) “ ( v 0 ' s o)( v 1 ' s l) ' ^0 ^1 ’ 

and generally, 

^-k+l( s ) = ( v k ‘ s k)Rk( s ) " M-k ^k-1 ^k-l( s ) ' 9kJc-2 ^k-1 ^k-2 ^k-2( s ) 



which is exactly the expansion for det (Qk+rSk+i) with respect to the bottom row. 
Hence 



for n >1. Asa by-product we have the identity det(-Q n ) = Xq Xj ... X. n .j. 

The vectors of occupancy times during passage from 0 to n define our class of 
n-variate exponential distributions. Their mgf s have the form 



where the polynomials R n , linear in the variables Sq, Sj, . . ., Sn.j , are generated 
recursively by (2.9). 

The versions of the identities (2.8) and (2.9) for the birth- death process (all 

j equal to 0) were obtained by Wang Zi Kun (1980). 

The bivariate exponential distribution generated by occupancy times during 
passage from 0 to 2 has the mgf 




(2.9) 



R n (s) = det (Q n - S n ) 



( 2 . 10 ) 



( P0,n( s ) = ^0 V" Vl^nCs) , 



( 2 . 11 ) 



Xj / R2 (s) — Xq Xj / [(Vq - Sq)(V j - Sj) - Xq |Ij] , 



and the joint density 
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X 0 X y exp (-v 0 x 0 - VjXj) Lq(X q p h x 0 xO 



( 2 . 12 ) 
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where 

L 0 (x) = I k xk/(k!)2. 

The distribution (2.12) has been previously defined by Downton (1970), and in a more 
general context by Kibble (1941). 

We define for h > 0 



L h (x) = I k x k /k! r(k+h+l), (2.13) 

which is an analytic version of the Besel function of order h. The bivariate gamma 
distribution (BVT) with scale a>0 corresponding to (2.1 1) has the density 

. (^^(xox^- 1 exp (-v 0 x 0 - VjXi) Lo&oHjXoxO . (2.14) 



The BVE distribution (2.11) has the mean {(1 + JXjAjJAo* lAj) and correlation 

(ij/Vje [0,1). No correlation corresponds to independence. The conditional exponential 
distributions defined from (2.11) by conditioning on xq or x^ have linear regressions: 



E(Xq I X^= x^) = X 0 -l(l + M-i x i) 
and 



E(X 1 | X 0 = x 0 ) = Vj-l(l +X 0 |i 1 x 1 /v 1 ). 



The occupancy times during passage from 0 to n>2 in birth-death processes form a 
conditionally independent sequence and their joint density can be partitioned into a product 
of conditional exponential densities, see Johnson & Kotz (1972) or Longford (1982). Such 
a sequence can be used to model an AR(1) times series, although the innovation 
distribution is difficult to describe. 

The general trivariate exponential density has the form 

A^j^exp (-v 0 x 0 -v I .t r v 2 x 2 ) 





i 
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x Ik Lk(^o4’ v ''') L k (X 1 |I 2 x 1 x 2 )(X 0 X 1 ^ 2i o x O x l x 2) lc / k! • (2-15) 

The proof is given in the Appendix. Clearly this and densities for higher dimensions are not 
suitable for direct maximum likelihood estimation. An alternative approach to MLE is 
discussed in Section 4. 

All bivariate marginals of the trivariate exponential (2.13) belong to the BVE 
class. The correlation matrix for the trivariate exponential is 

{ 1 1 

l-XjXjt 1 

k l-v^t 1-A.j /v 2 1 j 



and the means are ( 1 /(Xq^ 1 ^ 2 t), v 2^i^ 2’ ^2 1 » where x = l/(VjV 2 - X.(l 2 ) . 



3. MULTIVARIATE GEOMETRIC CLASS. 

The constructions of the MVE class from occupancy times in Markov processes 
have their obvious analogues for discrete distributions in occupancy times in Markov 
chains. For example, the birth-death process has its analogue in the Markov chain which 
allows jumps only one step up or down (discrete random walk, skip-free in both 
directions). However, the distributions of the occupancy times in such Markov chains have 
a more complex structure than their birth-death analogues. Rather than give an example we 



return to this point in the conclusion of this Section. 


Let 










*0 


u 0 


0 




di 


a l 


uj 0 




r 2,0 


d 2 


a 2 u 2 


A = 


r 3,0 


r 3, 1 


d 3 a 3 
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be a matrix of transition probabilities of an upwards skip-free Markov chain (all entries 
non-negative, u, > 0 for all i, row totals equal to 1). In analogy with Section 2 we denote by A n 
the nxn upper left-hand comer submatrix of A, and define the first hitting times and occupancy 
times vectors. 

The occupancy time in a state k during passage from 0 to n (0 < k < n) is a 
compound distribution of the individual waiting times in the state which are geometric starting at 1. 
To obtain a geometric distribution we could either subtract the constant 1 from the occupancy time, 
or subtract 1 from every waiting time. We choose the latter option, and refer to the underlying 
Markov process as the 'instant' Markov chain. In this Markov chain waiting times can be equal 
to zero with positive probability, and so a sequence of states can be visited within the same 
time-instant. Our main motivation for this definition of a Markov process is to construct a class of 
multivariate geometric distributions (MVG) with analogous structure to the MVE class defined 
in Section 2. 

For the first hitting times and the occupancy times in instant Markov chains given by the 
probability transition matrix A we use the notation identical to that introduced in Section 2., T k n 

or T k +, and T kn or T k +, respectively. The vector z = (z q, Z\, z 2 , . . . ) will be used as the 
argument in the probability generating functions (pgf) P k>n for the occupancy times vectors: 

P k>n (z) = E (zTkji) . 



The formula (2.5) has a direct analogue in 

P k,n( z ) = P k + ( z ) P k+l + ( z ) ••• P n-l + ( z ) > (3.1) 

where P h + (z) = Ph,h+i( z )> and the backwards equations yield 

p k + ( z ) = 



(1 - a k z k )-l [u k + d k P k . u+1 (z) + r k(k _ 2 P k - 2 ,k+i( z ) + • • • + r k ,o p 0,k+l( z )] 



P k + ( z ) = 





with the solution in a recursive form 



(3.2) 
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u k /(l - a k z k ) [1 - d k /(l - a k z k ) P k _!+(z) - r kJc _ 2 /(l - a k z k ) P k _ 2>k ( z ) + 



... +r k , 0 /(l-a k z k )Po 4c (z)]-l, (3.3) 

P 0 +(z) = (1 - ao)/(l - a^) . 

The formula (3.3) is a convolution of a univariate geometric and a geometric compound 
distribution. This, together with (3.1), implies infinite divisibility of all occupancy times 
distributions. We declare the class of all distributions generated by occupancy times during passage 
from 0 to n as our MVG class. This definition can be extended to the class of multivariate 
negative binomial distributions (MVNb) in the obvious way. 

The identities (3.2) and (3.3), compared with (2.6) and (2.7) indicate a one-to-one 
correspondence between occupancy times in continuous and discrete processes. Moreover, we 
have a cne-to-one correspondence between the underlying processes: 



If u k = X k / v k (k>0) 

d k = lV v k ( ksl ) 

r k,h = ^kf/ v k (k > h+2 > 2) 

then 



9k^( s ) = P k,n(s+D . (3.4) 

where 1 = (1, 1, ... ). This one-to-one correspondence between the MVE and MVG classes is 
the natural extension of the one-to-one correspondence for the univariate exponential and geometric 
distributions. 

In complete analogy with the continuous case we obtain the identity 
P 0) n(z) = uqU! ... u n . t /T n (z) , (3.5) 

where T n are polynomials linear in zq, zj, ..., z n _i, generated by the recursive formula 



~ G a n z n) rp n( z ) ' d n u n-lTn-l( z ) “ r n,n-2 u n-2 u n-lTn-2( z ) " 



- r n 0 u 0 uj . . . u n .j 



with T 0 (z) = 1 and Tj(z) = 1 - uqzq . It is easy to show by induction that 

O t ' 
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(3.6) 
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T„(z) = det {I n - A n (z)} , 



(3.7) 



where A n (z) is formed from the matrix A n by replacing its diagonal elements a^ by a^zp, 
(0<k<n), and is I n the nxn unit matrix. 

For the bivariate and trivariate geometric distributions the joint probabilities and moments 
(correlations) can be obtained by standard methods, in complete analogy with the exponential case. 
For higher dimensions the formulae are not tractable. 

The backwards equations for the occupancy times in classical Markov chains also define 
a class of MYG distributions (and are infinitely divisible), but the one-to-one correspondence with 
our MVE has a substantially more complex and less natural form. Even the distributions of the 
first hitting times in Markov chains have a substantially more complex structure than the fht's in 
continuous time; for details see Kent & Longford (1983). 



4. MAXIMUM LIKELIHOOD ESTIMATION 
AND TIME SERIES APPLICATIONS. 



Maximum likelihood estimation for the BVE and BVT given by the densities (2.12) 
and (2.14), respectively, can be efficiently carried out by application of standard numerical 
methods using some well-known recursive formulae for computation of Bessel functions and ratios 
of Bessel functions. 

Since dL^(x)/dx = L^+^x), the derivatives of the log-likelihood involving the 
bivariate densities of the form (2.14) involve ratios of Bessel functions, L^+^xJ/L^x). Efficient 
recursive algorithms for calculation of such ratios were derived by Amos ( 1974); other useful 
identities are given in Abramowicz & Stegun (1972). The natural parameter space for the BVE is 

not an open space because of the boundary fij = 0. It is easy to show that the maximum likelihood 

estimate of (ij is positive if and only if the sample covariance N _1 Xj *0iXii - xo*l is positive, 
see Longford (1982) where other numerical details are discussed. 

The MVE class generated as the occupancy times vectors from birth-death processes 
have conditionally independent components, and they can be used for modelling of exponential 
AR(1) time series. Since the likelihood for such a time series factors into univariate conditional 
exponential densities, direct maximum likelihood is feasible. 

The form of the density of the general trivariate exponential distribution renders standard 
maximum likelihood methods impossible, even though the corresponding mgf s have a very 




n; 
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simple structure. Feuerverger and McDonough (1981) have developed procedures for maximum 
likelihood estimation based on the empirical mgf and proved that these procedures can be 
'fine-tuned' to arbitrarily high relative efficiency, given some information about the estimated 
parameters. Their methods appear to be tailor-made for our classes of multivariate distributions 

(MVE, MVr, MVG, and MVNb) because they offer a unified approach to estimation in all 
these classes with generating functions of similar functional form. The main practical point in 
application of the methods of Feuerverger & McDonough is in determining the number and location 
of the points in which the mgf/pgf would be approximated. These issues could be explored in 
the special case of BVE where direct maximum likelihood estimation is available. It is not clear 
though to what extent these results could be generalized to MVE. Of course, the moment method 
of estimation is another tractable option, owing to the simple form of the mgf/pgf. 

The MVE class of n-variate distributions (n>2) has the subclass of n independent 
univariate exponentials and the larger subclass of the distributions with conditionally independent 
components (AR(1), generated from birth-death processes). It appears natural to define a whole 
set of nested classes of MVE distributions by allowing the generating Markov process to have the 
first 2, 3, ..., n-1 non- zero subdiagonals in the transition intensities matrix Q. If Q has only the 
first subdiagonal non- zero, we have an AR(1) time series. We conjecture that if the first k 
subdiagonals are non-zero then the resulting MVE has an AR(k) structure, i.e., it forms a 
k-step conditionally independent sequence: and Z k+ h + i are conditionally independent, given 

Z h+1 , Z h+2 , . . ., Z k 4 . h . Definition of these subclasses of MVE imposes a structure upon the 
entire MVE class that could be used for description of the complexity of the correlation structure 
of an exponential time series or a multivariate sample from MVE. 



5. MVE AS A GENERALIZED CHI-SQUARE DISTRIBUTION. 

Let X 1 = (X 11 ,X 12 ,...,X ln ) and X 2 = (X 21 , X 22 , . . ., X 2n ) be a pair of 
independent and identically distributed normal random vectors with mean 0 and variance matrix 

Q. Then the random vector Y= (Yj, Y 2 , . . ., Y n ) given by Y k = X lk 2 + x 2k 2 defines an 
n-variate exponential distribution. The original idea for this definition is due to Kibble (1941). We 

will refer to this derivation as the generalized x 2 2 . ^ is easy tc show that the mgf for Y is 



det (1/2 Q' 1 ) / det (1/2 Q' 1 - S n ) , 

which closely resembles the functional form of our MVE, see (2.11) and (2.10). Kent (1982) 
has in fact proved that the subclass of our MVE class arising from birth-death processes coincides 




12 



with the subclass of the generalized distributions derived from variance matrices ft for which 
ft" 1 is tridiagonal. 

An obvious extension of this identity is the following conjecture: The distributions of the 

MVE with AR(k) structure (as defined in Section 4) coincide with the generalized X 2 2 

distributions derived from variance matrices ft such that ft' 1 have k non-zero rows below and 
above the main diagonal. The proof of Kent (1982) cannot be extended for this general 
proposition, and we do not have an alternative method of proof. 
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APPENDIX 



The density of trivariate exponential distribution 



The mgf of the trivariate exponential distribution is 

X Q X 1 X 2 

(^0 “ s o)(' )f l " s l)(’ )r 2 ~ s 2) ~ x 0Pl( r 2 " s 2^ ” ^lV l 2( ir 0 “ s o) “ S2,0 X 0 X 1 



X 0 X 1 X 2 

(^0 ” s o) ("U " s l)( r 2 " s 2^ 



Jo ‘ r i - S 1)' 



r x 0 “ l 



X 1^2 



$2,0 X 0 X 1 



r 0 _s 0 ir 2 _s 2 (T r 0~ s oKT r 2 _s 2) 



= X 0 X 1 X 2 III 



(k 1 +k 2 +k 3 ) ! 
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The summands above are independent Gammas, and the corresponding density is 
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Note that if £ 2j q = 0 (conditional independence), this density collaspes to 
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